Hit MOZ_CRASH(bug: this is an unexpected case - please open a bug and talk to #gfx team!) at gfx/wr/webrender/src/spatial_tree.rs:957
Categories
(Core :: Web Painting, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox-esr102 | --- | wontfix |
firefox94 | --- | unaffected |
firefox95 | --- | unaffected |
firefox96 | --- | wontfix |
firefox97 | --- | wontfix |
firefox98 | --- | wontfix |
firefox99 | --- | wontfix |
firefox100 | --- | wontfix |
firefox101 | --- | wontfix |
firefox102 | --- | wontfix |
firefox103 | --- | wontfix |
firefox104 | --- | wontfix |
firefox105 | --- | wontfix |
firefox106 | --- | fixed |
People
(Reporter: tsmith, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression, testcase, Whiteboard: [bugmon:bisected,confirmed])
Crash Data
Attachments
(2 files, 1 obsolete file)
Found while fuzzing m-c 20211201-89800efd9e5c (--enable-address-sanitizer --enable-fuzzing)
Hit MOZ_CRASH(bug: this is an unexpected case - please open a bug and talk to #gfx team!) at gfx/wr/webrender/src/spatial_tree.rs:957
#0 0x7f6ca7fcece0 in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:261:3
#1 0x7f6ca7fcece0 in RustMozCrash /gecko/mozglue/static/rust/wrappers.cpp:18:3
#2 0x7f6ca7fcec16 in mozglue_static::panic_hook::h183adc4d73b027cc /gecko/mozglue/static/rust/lib.rs:91:9
#3 0x7f6ca7fcdf95 in core::ops::function::Fn::call::h2c49d8cefb0980e2 /builds/worker/fetches/rust/library/core/src/ops/function.rs:70:5
#4 0x7f6ca9b51077 in std::panicking::rust_panic_with_hook::hd83d5a96a789e1d3 (/home/worker/builds/m-c-20211201050507-fuzzing-asan-opt/libxul.so+0x187bf077)
#5 0x7f6ca6afa2f8 in std::panicking::begin_panic::_$u7b$$u7b$closure$u7d$$u7d$::h5cffcd3c09905dfb /builds/worker/fetches/rust/library/std/src/panicking.rs:544:9
#6 0x7f6ca6acfe09 in std::sys_common::backtrace::__rust_end_short_backtrace::h1a31b5ce359ddaac /builds/worker/fetches/rust/library/std/src/sys_common/backtrace.rs:141:18
#7 0x7f6c95ef5cdc in std::panicking::begin_panic::hf2a51ed4961b214a /builds/worker/fetches/rust/library/std/src/panicking.rs:543:12
#8 0x7f6ca6d4213e in webrender::spatial_tree::SpatialTree::get_relative_transform_with_face::h8626d9d16120c196 /gecko/gfx/wr/webrender/src/spatial_tree.rs:957:9
#9 0x7f6ca6d280b0 in webrender::space::SpaceMapper$LT$F$C$T$GT$::set_target_spatial_node::h9a8da5cafc0dbf05 /gecko/gfx/wr/webrender/src/space.rs:73:29
#10 0x7f6ca6e36661 in webrender::picture::PicturePrimitive::propagate_bounding_rect::ha0ef5ac9c3a7e953 /gecko/gfx/wr/webrender/src/picture.rs:6005:13
#11 0x7f6ca6e4634a in webrender::picture_graph::PictureGraph::propagate_bounding_rects::h50a25e4d0c87eb6d /gecko/gfx/wr/webrender/src/picture_graph.rs:136:17
#12 0x7f6ca6da22b8 in webrender::frame_builder::FrameBuilder::build_layer_screen_rects_and_cull_layers::h2f457917e972846e /gecko/gfx/wr/webrender/src/frame_builder.rs:388:9
#13 0x7f6ca6da22b8 in webrender::frame_builder::FrameBuilder::build::h90317ccdd51e7690 /gecko/gfx/wr/webrender/src/frame_builder.rs:616:9
#14 0x7f6ca6e9ec83 in webrender::render_backend::Document::build_frame::h70345381e5af6802 /gecko/gfx/wr/webrender/src/render_backend.rs:452:25
#15 0x7f6ca6ece913 in webrender::render_backend::RenderBackend::update_document::hdf500b5e3e5a59cc /gecko/gfx/wr/webrender/src/render_backend.rs:1347:41
#16 0x7f6ca6eb9a2f in webrender::render_backend::RenderBackend::prepare_transactions::hd8b6a3269e0c5875 /gecko/gfx/wr/webrender/src/render_backend.rs:1196:28
#17 0x7f6ca6eb9a2f in webrender::render_backend::RenderBackend::process_api_msg::h6b76b71be3315e25 /gecko/gfx/wr/webrender/src/render_backend.rs:1048:17
#18 0x7f6ca6ad3216 in webrender::render_backend::RenderBackend::run::h74d2491579557b24 /gecko/gfx/wr/webrender/src/render_backend.rs:718:21
#19 0x7f6ca6ad3216 in webrender::renderer::Renderer::new::_$u7b$$u7b$closure$u7d$$u7d$::hc32fd3dfe39adfde /gecko/gfx/wr/webrender/src/renderer/mod.rs:1325:13
#20 0x7f6ca6ad3216 in std::sys_common::backtrace::__rust_begin_short_backtrace::h707c1c3cd5edb57f /builds/worker/fetches/rust/library/std/src/sys_common/backtrace.rs:125:18
#21 0x7f6ca6b257fe in std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hd8f10fa91d3a21d4 /builds/worker/fetches/rust/library/std/src/thread/mod.rs:481:17
#22 0x7f6ca6b257fe in _$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::hb8fb6945e3d0be79 /builds/worker/fetches/rust/library/core/src/panic/unwind_safe.rs:271:9
#23 0x7f6ca6b257fe in std::panicking::try::do_call::h347864793df2690f /builds/worker/fetches/rust/library/std/src/panicking.rs:403:40
#24 0x7f6ca6b257fe in std::panicking::try::he1f3d83b25eca1a7 /builds/worker/fetches/rust/library/std/src/panicking.rs:367:19
#25 0x7f6ca6b257fe in std::panic::catch_unwind::h4f630a3d4d41bb39 /builds/worker/fetches/rust/library/std/src/panic.rs:129:14
#26 0x7f6ca6b257fe in std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::h88c49cf0f29b96f2 /builds/worker/fetches/rust/library/std/src/thread/mod.rs:480:30
#27 0x7f6ca6b257fe in core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h6e87c90d8c20f292 /builds/worker/fetches/rust/library/core/src/ops/function.rs:227:5
#28 0x7f6ca9b57bf2 in std::sys::unix::thread::Thread::new::thread_start::h3b1213720f18b702 std.ac3adaa7-cgu.2
#29 0x7f6cb5f08608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
#30 0x7f6cb5ad0292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Reporter | ||
Comment 1•3 years ago
|
||
A Pernosco session is available here: https://pernos.co/debug/-s2ZnW7jpyTVd-SV6Fjjaw/index.html
Updated•3 years ago
|
Updated•3 years ago
|
Comment 2•3 years ago
|
||
Bugmon Analysis
Verified bug as reproducible on mozilla-central 20211201214955-d7af81571379.
The bug appears to have been introduced in the following build range:
Start: 8bee937821e3725b922352a0493f53b5e431c3d0 (20210524213758)
End: 38bfba07a1aca3de3dbf3183e16a9dca26c65c54 (20210525020049)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=8bee937821e3725b922352a0493f53b5e431c3d0&tochange=38bfba07a1aca3de3dbf3183e16a9dca26c65c54
Updated•3 years ago
|
Comment 3•3 years ago
|
||
Set release status flags based on info from the regressing bug 1741776
Comment 4•3 years ago
|
||
I suspect I'm building something incorrectly when trying to repro this. If I run a normal m-c build, I can't repro at all, and if I build with a mozconfig with the flags above (--enable-address-sanitizer --enable-fuzzing
) I get a crash on startup:
Program received signal SIGSEGV, Segmentation fault.
0x00005555556d914c in init () at /home/gw/code/work/gecko1/memory/build/mozjemalloc.cpp:4796
4796 static void init() {
Is there something else I need to do in order to build with those flags to try and repro?
Reporter | ||
Comment 5•3 years ago
|
||
(In reply to Glenn Watson [:gw] from comment #4)
Is there something else I need to do in order to build with those flags to try and repro?
Yes, ac_add_options --disable-jemalloc
and likely others, see here. The flags in the description are for bugmon.
Comment 6•3 years ago
|
||
Set release status flags based on info from the regressing bug 1741776
Updated•3 years ago
|
Comment 7•3 years ago
|
||
I got the testcase to crash reliably on a Winx64 Nightly build by:
- Save the testcase to local machine
- Close ALL open nightly windows/tabs. Basically kill firefox
- Double clicking on the testcase so that it opens in Nightly.
https://crash-stats.mozilla.org/report/index/d7688be1-17a9-4257-9ae1-35ae20211209
Comment 8•3 years ago
|
||
NI to follow up on comment 5. Let us know if you need further help with the build. However, based on comment 0 comment 7 I don't think you actually need an ASan build and the difference might be prefs or OS. To confirm, you can also just download the build used for fuzzing from TaskCluster.
Comment 9•3 years ago
|
||
Some bad news and good news from looking at this bug.
Bad: I was able to do a local asan build (thanks!), but I'm unable to repro this locally on current m-c at all, and wasn't able to determine what caused it in the pernosco trace.
Good: When I run this test case in a debug build, I see two asserts in the Gecko code that are likely to be the root cause of this assert when it gets to WR:
The first assert that hits is [1]. If I comment that out, we also hit an assert at [2].
Fixing whatever is causing those asserts in the Gecko code is very likely to fix the Gecko DL in a way that stops this later assert in WR occurring.
Timothy, Miko, are those asserts something that you would know about / be able to take a look at?
[1] https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/HitTestingTreeNode.cpp#374
[2] https://searchfox.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp#1362
Comment 10•3 years ago
|
||
Talked to gw on matrix about this. Those specific apz asserts came in fairly frequently from fuzzing, more than we have time to fix them all. Hopefully someone on the apz team can take a look to determine that it is indeed a bug on the gecko side that we'd like to fix (even if we don't get to fixing it) so that we can have confidence the webrender assert here is only hit when gecko produces an invalid display list (and not when gecko seems to be doing the right thing).
Comment 11•3 years ago
|
||
Here's a WebRenderScrollData dump:
WebRenderLayerScrollData(0x7fc262dbe008), descendantCount=6, visible=[]
WebRenderLayerScrollData(0x7fc262dbe190), descendantCount=5, visible=[]
WebRenderLayerScrollData(0x7fc262dbe938), descendantCount=0, metadata0={ [metrics={ [cb=(x=0, y=0, w=1280, h=887)] [sr=(x=0, y=0, w=1280, h=887)] [s=(0,0)] [dp=(x=0, y=0, w=1280, h=887)] [cdp=(x=0, y=0, w=0, h=0)] [rcs=(1280 x 887)] [v=(x=0, y=0, w=1280, h=887)] [z=(ld=1.000 r=1.000 cr=1 z=1 t=1 )] [u=(0 3)] scrollId=2 [rcd] }] [color=dev_rgba(255, 255, 255, 1.000000)] [overscroll=auto] [1 scrollupdates] }, visible=[]
WebRenderLayerScrollData(0x7fc262dbe318), descendantCount=3, metadata0={ [metrics={ [cb=(x=0, y=0, w=1280, h=887)] [sr=(x=0, y=0, w=1280, h=887)] [s=(0,0)] [dp=(x=0, y=0, w=1280, h=887)] [cdp=(x=0, y=0, w=0, h=0)] [rcs=(1280 x 887)] [v=(x=0, y=0, w=1280, h=887)] [z=(ld=1.000 r=1.000 cr=1 z=1 t=1 )] [u=(0 3)] scrollId=2 [rcd] }] [color=dev_rgba(255, 255, 255, 1.000000)] [overscroll=auto] [1 scrollupdates] }, ancestorTransform=[ 1 0; 0 1; 8 27; ] (asr=2), visible=[]
WebRenderLayerScrollData(0x7fc262dbe4a0), descendantCount=2, visible=[]
WebRenderLayerScrollData(0x7fc262dbe7b0), descendantCount=0, metadata0={ [metrics={ [cb=(x=0, y=0, w=1280, h=887)] [sr=(x=0, y=0, w=1280, h=887)] [s=(0,0)] [dp=(x=0, y=0, w=1280, h=887)] [cdp=(x=0, y=0, w=0, h=0)] [rcs=(1280 x 887)] [v=(x=0, y=0, w=1280, h=887)] [z=(ld=1.000 r=1.000 cr=1 z=1 t=1 )] [u=(0 3)] scrollId=2 [rcd] }] [color=dev_rgba(255, 255, 255, 1.000000)] [overscroll=auto] [1 scrollupdates] }, visible=[]
WebRenderLayerScrollData(0x7fc262dbe628), descendantCount=0, visible=[], fixedContainer=2, fixedAnimation=0x0, sideBits=0x0
We have an ancestor (0x7fc262dbe318) and descendant (0x7fc262dbe7b0) node, both with the RCD metadata.
Comment 12•3 years ago
•
|
||
Attached is a Gecko display list. Having a hard time reasoning about its validity, but certainly the fact that the Opacity item 0x7f048d6b7388 has a null ASR when its ancestor item (nsDisplayTransform p=0x7f048d6b74b0) and some of its children (including CompositorHitTestInfo p=0x7f048d6b6d40) have the RSF ASR, is at the very least suspicious.
Basically, it's a pattern like this:
Item1 ASR=null
Item2(Transform) ASR=RSF
Item3(Opacity) ASR=null
Item4 ASR=RSF
which the WebRenderScrollData building algorithm does not handle well. The Transform is initially deferred, then emitted as its own layer when we emit the Opacity because it has a different ASR. However, since the Transform itself doesn't forceNewLayerData
, it does not contribute its RSF ASR to mAsrStack
, and thus when we emit Item4, the RSF ASR is not on mAsrStack
, making stopAtAsr
the null ASR, and thus resulting in Item4's layer getting the superfluous RSF metadata.
Whether this display list structure is valid is probably a question for Markus, but one option for us to consider is to get the WebRenderScrollData building algorithm to handle it regardless of the answer.
Comment 13•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #12)
Whether this display list structure is valid is probably a question for Markus, but one option for us to consider is to get the WebRenderScrollData building algorithm to handle it regardless of the answer.
Note, this is a suggestion for how we could avoid a category of APZ assertions including the one triggered by this testcase. The WebRender assertion on this testcase would likely persist, since that's asserting a property of the WR display list, which would be unaffected by this change.
Comment 14•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #13)
Whether this display list structure is valid is probably a question for Markus, but one option for us to consider is to get the WebRenderScrollData building algorithm to handle it regardless of the answer.
I prototyped this idea here, and confirmed that it fixes the APZ assertion but not the WR assertion.
Comment 15•3 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #14)
(In reply to Botond Ballo [:botond] from comment #13)
Whether this display list structure is valid is probably a question for Markus, but one option for us to consider is to get the WebRenderScrollData building algorithm to handle it regardless of the answer.
I prototyped this idea here, and confirmed that it fixes the APZ assertion but not the WR assertion.
I'm going to land this APZ fix in bug 1751789.
I realize my investigation has not directly answered the question asked in comment 10 about whether the display list is valid -- I'm going to have to defer to Markus on that. (Markus -- please see comment 12 for the display list and my analysis of why it's problematic from an APZ point of view.)
Updated•3 years ago
|
Comment 16•3 years ago
|
||
I agree that this display list is not valid. The opacity and the transform are both for the same frame, a TableRowGroup, and they should have the same ASR. Since this is a table row group, the problem might be the way we reparent items for those frames. See bug 1735265 for a similar problem.
Comment 18•3 years ago
|
||
I tried the patch for bug 1735265 locally but unfortunately it does not fix this specific issue. I do not understand this testcase or the relevant code well enough to say anything more about this.
Comment 19•3 years ago
|
||
Moving this to web painting. P3/S3 due to fuzzing testcase that probably does not appear on web (table group with nested animations and fixed pos).
As Markus mentioned, this is related to large table background rework of bug 1409114. It seems that this does not correctly account for clips and ASRs.
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Comment 20•3 years ago
|
||
This issue has become much more frequent in the last week or two, marking as a fuzzblocker. It is now one of our top fuzzing crashes.
Please prioritize it appropriately, thank you.
Comment 21•3 years ago
|
||
(In reply to Tyson Smith [:tsmith] from comment #20)
This issue has become much more frequent in the last week or two, marking as a fuzzblocker. It is now one of our top fuzzing crashes.
Please prioritize it appropriately, thank you.
Hi Tyson, has there been a specific change that made this happen more frequently?
Reporter | ||
Comment 22•3 years ago
•
|
||
The spike seemed to start with m-c 20220518-dd970ebf97df.
Reporter | ||
Comment 23•3 years ago
|
||
So looks like Bug 1769855 or Bug 1749625 may be to blame? I looked at a new test case and it contains backdrop-filter
. Should I log a new issue?
<style>
* {
opacity: 0.258;
transform: rotate(0.128turn) !important;
backdrop-filter: contrast(13%) !important;
border-block-end: dashed lime thick;
}
</style>
Comment 24•3 years ago
|
||
The spike in this case is likely due to https://bugzilla.mozilla.org/show_bug.cgi?id=1769963 - there is a patch in that bug that I'm planning to land today.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 25•3 years ago
|
||
This bug prevents fuzzing from making progress; however, it has low severity. It is important for fuzz blocker bugs to be addressed in a timely manner (see here why?).
:tnikkel, could you increase the severity?
For more information, please visit auto_nag documentation.
Comment 26•3 years ago
|
||
asr asserts are tough to fix without breaking other stuff.
Comment 27•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #26)
asr asserts are tough to fix without breaking other stuff.
If you are not interested in this assert, can you disable it for fuzzing then?
Comment 28•3 years ago
|
||
Which assert are you hitting? The one in spatial_tree.rs or an asr assert?
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 29•3 years ago
|
||
This seems to have returned to a less frequent report rate (~1 a day). I am removing [fuzzblocker]
from the whiteboard since this is no longer applicable.
Reporter | ||
Comment 30•3 years ago
|
||
Here is an updated testcase that works consistently on Ubuntu 20.04.
Reporter | ||
Comment 31•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #28)
Which assert are you hitting? The one in spatial_tree.rs or an asr assert?
I am seeing Hit MOZ_CRASH(bug: this is an unexpected case - please open a bug and talk to #gfx team!) at gfx/wr/webrender/src/spatial_tree.rs:973
when running with an ASan build.
I am seeing Assertion failure: false (Two layers that scroll together have different ancestor transforms), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:1343
when running with a debug build.
Would an updated Pernosco session be helpful?
Comment 32•3 years ago
|
||
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Assertion failure: false (Two layers that scroll together have different ancestor transforms), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:1343
when running with a debug build.
There seems to be a nearly infinite supply of those asserts. I'm pretty sure there are open bugs with testcases hitting that assert. We keep fixing them and more testcases keep coming in to fix. They are not the easiest thing to debug and fix, we can't fix every one as they come in but we have fixed a lot of them.
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Hit MOZ_CRASH(bug: this is an unexpected case - please open a bug and talk to #gfx team!) at gfx/wr/webrender/src/spatial_tree.rs:973
when running with an ASan build.
If you see this assert with a testcase that does not generate any assert in display list code or apz code then make sure to file that and note that. If you get other asserts then that suggests that problem is likely with the "input" to webrender, and the other asserts might be pointing out the root cause.
Reporter | ||
Comment 33•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #32)
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Assertion failure: false (Two layers that scroll together have different ancestor transforms), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:1343
when running with a debug build.There seems to be a nearly infinite supply of those asserts. I'm pretty sure there are open bugs with testcases hitting that assert. We keep fixing them and more testcases keep coming in to fix. They are not the easiest thing to debug and fix, we can't fix every one as they come in but we have fixed a lot of them.
Ha! Ok, if the APZ assertion is not providing value can we disable it while fuzzing as :decoder suggested?
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Hit MOZ_CRASH(bug: this is an unexpected case - please open a bug and talk to #gfx team!) at gfx/wr/webrender/src/spatial_tree.rs:973
when running with an ASan build.If you see this assert with a testcase that does not generate any assert in display list code or apz code then make sure to file that and note that. If you get other asserts then that suggests that problem is likely with the "input" to webrender, and the other asserts might be pointing out the root cause.
If the apz assertion is disabled would we still see this?
Comment 34•3 years ago
|
||
(In reply to Tyson Smith [:tsmith] from comment #33)
(In reply to Timothy Nikkel (:tnikkel) from comment #32)
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Assertion failure: false (Two layers that scroll together have different ancestor transforms), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:1343
when running with a debug build.There seems to be a nearly infinite supply of those asserts. I'm pretty sure there are open bugs with testcases hitting that assert. We keep fixing them and more testcases keep coming in to fix. They are not the easiest thing to debug and fix, we can't fix every one as they come in but we have fixed a lot of them.
Ha! Ok, if the APZ assertion is not providing value can we disable it while fuzzing as :decoder suggested?
Botond's been doing the most work in this area recently, I'd like his thoughts.
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Hit MOZ_CRASH(bug: this is an unexpected case - please open a bug and talk to #gfx team!) at gfx/wr/webrender/src/spatial_tree.rs:973
when running with an ASan build.If you see this assert with a testcase that does not generate any assert in display list code or apz code then make sure to file that and note that. If you get other asserts then that suggests that problem is likely with the "input" to webrender, and the other asserts might be pointing out the root cause.
If the apz assertion is disabled would we still see this?
Disabling the apz assertion would only increase the chance of seeing the spatial_tree.rs assert (as I would expect the apz assertion to be hit first in some cases before the spatial_tree.rs assert, and I don't think the apz assertion is a requirement for hitting the spatial_tree.rs assert, or said another way I think you can hit the spatial_tree.rs assert without hitting the apz assert).
Comment 35•3 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #34)
(In reply to Tyson Smith [:tsmith] from comment #33)
(In reply to Timothy Nikkel (:tnikkel) from comment #32)
(In reply to Tyson Smith [:tsmith] from comment #31)
I am seeing
Assertion failure: false (Two layers that scroll together have different ancestor transforms), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:1343
when running with a debug build.There seems to be a nearly infinite supply of those asserts. I'm pretty sure there are open bugs with testcases hitting that assert. We keep fixing them and more testcases keep coming in to fix. They are not the easiest thing to debug and fix, we can't fix every one as they come in but we have fixed a lot of them.
Ha! Ok, if the APZ assertion is not providing value can we disable it while fuzzing as :decoder suggested?
Botond's been doing the most work in this area recently, I'd like his thoughts.
Here's a quick survey of the instances of the "Two layers that scroll together" assertion that have been reported by fuzzers since it was re-upgraded to MOZ_ASSERT in bug 1726450:
- Category 1: Cases where the fuzzer report led directly to functional changes that resolved the assertion: bug 1729581, bug 1753779. I think these are clear cases of the assertion providing value.
- Category 2: Cases where the assertion was resolved by functional changes made independently: bug 1734510, bug 1764942. I think these are cases of the assertion potentially providing value (and motivation for investigating fuzzer reports promptly).
- Category 3: Cases where the assertion was resolved by restricting it to not fire in some edge cases: bug 1743731, bug 1771503. These are examples of the assertion not providing value.
As for open reports of this assertion, I could find only bug 1781117, which I haven't had a chance to look at (I've been on PTO for most of the time since its filing) but plan to.
So, on balance, I would lean towards not disabling the "Two layers that scroll together" assert, so long as we can keep up with fixing reports of it firing. (I think the above shows we have a pretty good track record, with 6 fixed and just 1 recently-filed open so far.) If we stop being able to keep up, we can reconsider.
Comment 36•3 years ago
•
|
||
Just wanted to add: I don't think that, for testcases that trip both the "Two layers that scroll together" assert and the spatial_tree.rs assert in the bug title, that fixing the former will necessarily fix the latter.
To explain why, here's a bit of background:
- The "Two layers that scroll together" assertion is asserting properties of the WRScrollData (technically, of the APZ hit-testing tree whose structure corresponds directly to the WRScrollData).
- The spatial_tree.rs assertion is asserting properties of the WebRender display list (WRDL).
- Both the WRScrollData and the WRDL and built from the Gecko DL.
The approach we've been taking recently to fix "Two layers" assertions, is to harden the WRScrollData building step against issues in the Gecko DL (e.g. in bug 1753779). The underlying issues in the Gecko DL, and any downstream consequences on the WRDL, remain.
In principle it would be better to fix the issues at the Gecko DL level, but I don't really have the expertise to do that (I tried in e.g. bug 1729581, with questionable results, e.g. regressing other assertions), and my conversations with people who do (like Markus) have suggested that the long-term plan in this area will be to remove the Gecko DL and build the WRDL (and I guess also the WRScrollData?) directly from the frame tree.
Anyways, my point here is, the question of whether the spatial_tree.rs assert provides value should be evaluated independently of the "Two layers" assert.
Updated•3 years ago
|
Comment 37•3 years ago
|
||
Bugmon Analysis
Testcase crashes using the initial build (mozilla-central 20211201050507-89800efd9e5c) but not with tip (mozilla-central 20220826214835-be22852de7df.)
The bug appears to have been fixed in the following build range:
Start: ed1f1140d8bd7dca6d3d4cd06ff824b1021d2957 (20220825043958)
End: 68ec3db74342eb63fc142f2d28675ad698ba0062 (20220825020420)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ed1f1140d8bd7dca6d3d4cd06ff824b1021d2957&tochange=68ec3db74342eb63fc142f2d28675ad698ba0062
tsmith, can you confirm that the above bisection range is responsible for fixing this issue?
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.
Comment 38•3 years ago
|
||
From that range it's plausible that bug 1661147 affected this.
Reporter | ||
Comment 39•3 years ago
|
||
The attached test case now triggers bug 1735444 with a debug.
The originally reported issue is no longer reproducible.
Updated•3 years ago
|
Description
•